Skip to content

Conversation

AndreasArvidsson
Copy link
Member

@AndreasArvidsson AndreasArvidsson commented Jan 10, 2025

Today we generate all captures in the entire document whenever we do a Tree sitter query. This is of course quite expensive for large documents and it would be better if we could narrow the search range. That is what's implemented in this pull request.

I also found some incorrect scm patterns that broke with this implementation that I have fixed.

Fixes #1769

Checklist

  • [/] I have added tests
  • [/] I have updated the docs and cheatsheet
  • [/] I have not broken the cheatsheet

@AndreasArvidsson AndreasArvidsson requested a review from a team as a code owner January 10, 2025 13:54

;;!! aaa = 0;
(
(_
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a common ancestor for all captures. Nodes not intersecting with the search range will not be matched.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more information
#1769 (comment)

@phillco
Copy link
Member

phillco commented Jan 10, 2025

Note to self:

  • We wanted to do this originally in 2023 but had to revert due to a tree-sitter bug, see Narrow search range for tree-sitter queries #1769
  • At some point that got fixed, query.matches function ignores top-level wildcard nodes when determining whether to include match tree-sitter/tree-sitter#2484 was closed a few weeks ago
  • This reintroduces the logic to specify a document text range when querying
  • For containing or iteration scope you get the greatest possible benefit because we know exactly the range we need for the containing scope and it will likely be quite small compared to the document size
    • (NB: smearing by one character exactly is pretty nifty)
  • This change also introduces a second optimization: if not using a containing scope, only searching forward or backwards, rather than the entire document since many expensive queries are often directional ("take next funk" without an iteration scope for example).

@phillco
Copy link
Member

phillco commented Jan 10, 2025

Possible future direction: We could make querying more incremental so it doesn't lock up with a gigantic query, but that's tricky because we don't necessarily know the tradeoff point. Multiple queries could be slower overall.

Some of this is limited by the fact that the actual grammar allows you to select an infinite amount of targets if you ask for it. So there's some limit to what we can do. We discussed the idea of a configuration default limit, but it's pretty hazy. I seem to recall least one person running into a problem where VS Code became extremely laggy after they accidentally asked for hundreds of thousands of selections. OTOH it's conceivable you could want to use this to do a gigantic refactor across the whole file?

…andlers/TreeSitterScopeHandler/getQuerySearchRange.ts
@AndreasArvidsson AndreasArvidsson added this pull request to the merge queue Jan 10, 2025
Merged via the queue into main with commit 9d90d16 Jan 10, 2025
15 checks passed
@AndreasArvidsson AndreasArvidsson deleted the treeSitterSearchRange branch January 10, 2025 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Narrow search range for tree-sitter queries

2 participants